seed query
- Europe > Italy > Lombardy > Milan (0.04)
- North America > United States (0.04)
- Europe > Monaco (0.04)
- Europe > Austria (0.04)
- Europe > Italy > Lombardy > Milan (0.04)
- North America > United States (0.04)
- Europe > Monaco (0.04)
- Europe > Austria (0.04)
Active Learning of Classifiers with Label and Seed Queries (Supplementary Material)
A.3 Pseudocode of CPLearn and full proof of Theorem 10 We present CPLearn and prove Theorem 10. There are two main obstacles in implementing this process. Let us now turn to the invariants. As shown in Bressan et al. [2021a], this implies Now let us turn to CPLearn. This proves the second invariant.
NativQA Framework: Enabling LLMs with Native, Local, and Everyday Knowledge
Alam, Firoj, Hasan, Md Arid, Laskar, Sahinur Rahman, Kutlu, Mucahid, Darwish, Kareem, Chowdhury, Shammur Absar
The rapid advancement of large language models (LLMs) has raised concerns about cultural bias, fairness, and their applicability in diverse linguistic and underrepresented regional contexts. To enhance and benchmark the capabilities of LLMs, there is a need to develop large-scale resources focused on multilingual, local, and cultural contexts. In this study, we propose the NativQA framework, which can seamlessly construct large-scale, culturally and regionally aligned QA datasets in native languages. The framework utilizes user-defined seed queries and leverages search engines to collect location-specific, everyday information. It has been evaluated across 39 locations in 24 countries and in 7 languages -- ranging from extremely low-resource to high-resource languages -- resulting in over 300K Question-Answer (QA) pairs. The developed resources can be used for LLM benchmarking and further fine-tuning. The framework has been made publicly available for the community (https://gitlab.com/nativqa/nativqa-framework).
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Asia > Middle East > Yemen > Amanat Al Asimah > Sanaa (0.04)
- (45 more...)
Xpose: Bi-directional Engineering for Hidden Query Extraction
Pradhan, Ahana, Haritsa, Jayant
Query reverse engineering (QRE) aims to synthesize a SQL query to connect a given database and result instance. A recent variation of QRE is where an additional input, an opaque executable containing a ground-truth query, is provided, and the goal is to non-invasively extract this specific query through only input-output examples. This variant, called Hidden Query Extraction (HQE), has a spectrum of industrial use-cases including query recovery, database security, and vendor migration. The reverse engineering (RE) tools developed for HQE, which are based on database mutation and generation techniques, can only extract flat queries with key-based equi joins and conjunctive arithmetic filter predicates, making them limited wrt both query structure and query operators. In this paper, we present Xpose, a HQE solution that elevates the extraction scope to realistic complex queries, such as those found in the TPCH benchmark. A two-pronged approach is taken: (1) The existing RE scope is substantially extended to incorporate union connectors, algebraic filter predicates, and disjunctions for both values and predicates. (2) The predictive power of LLMs is leveraged to convert business descriptions of the opaque application into extraction guidance, representing ``forward engineering" (FE). The FE module recognizes common constructs, such as nesting of sub-queries, outer joins, and scalar functions. In essence, FE establishes the broad query contours, while RE fleshes out the fine-grained details. We have evaluated Xpose on (a) E-TPCH, a query suite comprising the complete TPCH benchmark extended with queries featuring unions, diverse join types, and sub-queries; and (b) the real-world STACK benchmark. The experimental results demonstrate that its bi-directional engineering approach accurately extracts these complex queries, representing a significant step forward with regard to HQE coverage.
- Europe > France (0.04)
- Asia > China (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- (9 more...)
Active Learning of Classifiers with Label and Seed Queries
Bressan, Marco, Cesa-Bianchi, Nicolò, Lattanzi, Silvio, Paudice, Andrea, Thiessen, Maximilian
We study exact active learning of binary and multiclass classifiers with margin. Given an $n$-point set $X \subset \mathbb{R}^m$, we want to learn any unknown classifier on $X$ whose classes have finite strong convex hull margin, a new notion extending the SVM margin. In the standard active learning setting, where only label queries are allowed, learning a classifier with strong convex hull margin $\gamma$ requires in the worst case $\Omega\big(1+\frac{1}{\gamma}\big)^{(m-1)/2}$ queries. On the other hand, using the more powerful seed queries (a variant of equivalence queries), the target classifier could be learned in $O(m \log n)$ queries via Littlestone's Halving algorithm; however, Halving is computationally inefficient. In this work we show that, by carefully combining the two types of queries, a binary classifier can be learned in time $\operatorname{poly}(n+m)$ using only $O(m^2 \log n)$ label queries and $O\big(m \log \frac{m}{\gamma}\big)$ seed queries; the result extends to $k$-class classifiers at the price of a $k!k^2$ multiplicative overhead. Similar results hold when the input points have bounded bit complexity, or when only one class has strong convex hull margin against the rest. We complement the upper bounds by showing that in the worst case any algorithm needs $\Omega\big(k m \log \frac{1}{\gamma}\big)$ seed and label queries to learn a $k$-class classifier with strong convex hull margin $\gamma$.
- Europe > Italy > Lombardy > Milan (0.04)
- North America > United States (0.04)
- Europe > Monaco (0.04)
- Europe > Austria (0.04)